In this notebook, I have created a step-by-step documentation of the analysis, and I will try to comment as much as possible to make clear what I have analysed and what I think could be important results that might indicate what biology is behind the expression differences we observe.
We established that sample 47 was undefined and non-sensical in our previous analysis and is thus omitted from this analysis. Here we take the recent data (50,51,52) to analyse and understand if ablation shows a identifyable effect.
Registered S3 methods overwritten by 'htmltools':
method from
print.html tools:rstudio
print.shiny.tag tools:rstudio
print.shiny.tag.list tools:rstudio
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
Registered S3 method overwritten by 'spatstat.core':
method from
formula.glmmPQL MASS
Attaching SeuratObject
Attaching sp
Now we perform QC, looking at the percentage of mitochondrial
RNA vs other RNA, plus other metrics.
* nFeature_RNA = number of genes
* nCount_RNA = number of UMIs or Counts
* percent.mt = percent of expression of mitochondrial genes versus the
rest
# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
oligos[["percent.mt"]] <- PercentageFeatureSet(oligos, pattern = "^mt-")
# Visualize QC metrics as a violin plot
VlnPlot(oligos, group.by = "Sample",features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 1,pt.size = 0.1)
The samples seem to differ QC-wise, and many samples show a wide spread for the percent of mitochondrial expression, requiring a stringent cut-off. Now we plot the QC information in another way to see if we can estimate thesholds for removing bad cells and perhaps doublets.
# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
plot1 <- FeatureScatter(oligos, group.by = "Sample",feature1 = "nCount_RNA", feature2 = "percent.mt",pt.size = 0.5)
plot2 <- FeatureScatter(oligos, group.by = "Sample", feature1 = "nCount_RNA", feature2 = "nFeature_RNA",pt.size = 0.5)
CombinePlots(plots = list(plot1, plot2))
Warning: CombinePlots is being deprecated. Plots should now be combined using the patchwork system.
These samples seem to be performing differently, and we have very
high percent of mitochondrial genes especially in the TC_50 sample,
which concurrently is also lower in number of genes expressed and number
of UMIs detected. Now we will remove cells expressing less that 200
genes (to remove bad cells),
and more than 3000 genes (to remove doublets). And remove cells
expressing more that 5% mitochondrial genes. And replot the QC data.
Just to alleviate any concerns, the downstream analysis does not seem to be significantly changed with more stringent or loosened cut-offs, in terms of DTA significant genes.
#Clean up the data
oligos <- subset(oligos, subset = nFeature_RNA > 200 & nFeature_RNA < 3000 & percent.mt < 15)
ncol(oligos)
[1] 9291
Now we look at the cleaned-up data.
# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
oligos[["percent.mt"]] <- PercentageFeatureSet(oligos, pattern = "^mt-")
# Visualize QC metrics as a violin plot
VlnPlot(oligos, group.by = "Sample",features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 1,pt.size = 0.1)
# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
plot1 <- FeatureScatter(oligos, group.by = "Sample",feature1 = "nCount_RNA", feature2 = "percent.mt",pt.size = 0.5)
plot2 <- FeatureScatter(oligos, group.by = "Sample", feature1 = "nCount_RNA", feature2 = "nFeature_RNA",pt.size = 0.5)
CombinePlots(plots = list(plot1, plot2))
Warning: CombinePlots is being deprecated. Plots should now be combined using the patchwork system.
Now we normalize the dataset.
Generating the UMAP and TSNE.
oligos.integrated <- RunUMAP(oligos.integrated, dims = 1:30)
13:25:46 UMAP embedding parameters a = 0.9922 b = 1.112
13:25:46 Read 9291 rows and found 30 numeric columns
13:25:46 Using Annoy for neighbor search, n_neighbors = 30
13:25:46 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
13:25:46 Writing NN index file to temp file /var/folders/wx/df1rnvs15s5434nztss_v4z00000gn/T//Rtmpz45HdB/fileaacd2c0c2ab6
13:25:46 Searching Annoy index using 1 thread, search_k = 3000
13:25:48 Annoy recall = 100%
13:25:48 Commencing smooth kNN distance calibration using 1 thread
13:25:49 Initializing from normalized Laplacian + noise
13:25:49 Commencing optimization for 500 epochs, with 390252 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
13:26:04 Optimization finished
oligos.integrated <- RunTSNE(oligos.integrated, dims = 1:30)
plots <- DimPlot(oligos.integrated, group.by = c("Sample"), combine = FALSE)
plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
byrow = TRUE, override.aes = list(size = 3))))
CombinePlots(plots)
Warning: CombinePlots is being deprecated. Plots should now be combined using the patchwork system.
plots <- TSNEPlot(oligos.integrated, group.by = c("Sample"), combine = FALSE)
plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
byrow = TRUE, override.aes = list(size = 3))))
CombinePlots(plots)
Warning: CombinePlots is being deprecated. Plots should now be combined using the patchwork system.
Both the UMAP and tSNE show that the DTA and the Control are separated. I will now perform clustering as normal, then I will follow this up by integrating the data with the previous data from the Science paper to show how these clusters relate to the original papers clusters.
Now we plot show expression of some common genes found previously to be stable clusters within the mouse OLs, just for reference.
DefaultAssay(oligos.integrated) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
#FeaturePlot(oligos.integrated, c("Tmsb4x","Tpt1","Fth1","Enpp2","App"),pt.size = 0.1)
FeaturePlot(oligos.integrated, c("Pdgfra", "Ptprz1","Bmp4","Itpr2", "Egr1", "Klk6", "Hopx", "Ptgds","Il33"),pt.size = 0.1)
DefaultAssay(oligos.integrated) <- "SCT"
As you can see, most of the OL cluster seems to be Ptgds/Il33 postitive instead of Hopx/Klk6 meaning that we might have mostly MOL5/6 of the original cluster.
I show the clusters on the UMAP so you can see their position.
oligos.integrated <- FindNeighbors(oligos.integrated, dims = 1:30,nn.method="rann")
Computing nearest neighbor graph
Computing SNN
oligos.integrated <- FindClusters(oligos.integrated,algorithm = 4,resolution = 0.6)
DimPlot(oligos.integrated, group.by = c("seurat_clusters"), combine = FALSE)
[[1]]
DimPlot(oligos.integrated, group.by = c("Sample"), combine = FALSE)
[[1]]
oligos.integrated$seurat_clusters_rn <- factor(oligos.integrated$seurat_clusters, levels= c(
"4",
"11",
"10",
"9",
"7",
"6",
"8",
"2",
"1",
"5",
"3"))
library(plyr)
oligos.integrated$seurat_clusters_rn <- revalue(as.factor(oligos.integrated$seurat_clusters_rn), c("4"="OPC","11"="OPC cycling","10"="COP/NFOL/MFOLa","9"="MFOLb","7"="MOL1","6"="MOL2a","8"="MOL2b","2"="MOL5/6a","1"="MOL5/6b","5"="MOL5/6c","3"="MOL5/6d"))
DimPlot(oligos.integrated, group.by = c("seurat_clusters_rn"), combine = FALSE,label=T)
[[1]]
DimPlot(oligos.integrated, group.by = c("seurat_clusters"), combine = FALSE)
[[1]]
oligos.integrated.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
Below follows the heatmap showing the top 10 genes based on fold change for each cluster.
DefaultAssay(oligos.integrated) <- "SCT"
Idents(oligos.integrated) <- oligos.integrated@meta.data$seurat_clusters_rn
top10 <- oligos.integrated.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(oligos.integrated, features = top10$gene) + NoLegend()
Idents(oligos.integrated) <- oligos.integrated@meta.data$seurat_clusters
And here are the top 2 genes found for each cluster as shown on the UMAP.
DefaultAssay(oligos.integrated) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
top2 <- oligos.integrated.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
FeaturePlot(oligos.integrated, features = top2$gene,pt.size = 0.1)
DefaultAssay(oligos.integrated) <- "SCT"
Here I show a table with the DTA relating to the Seurat found clusters. TRUE means Ablated
condition <- oligos.integrated@meta.data$Sample
condition <- strsplit(condition,"_")
Ablated <- as.logical()
for (i in 1:length(condition)) {
Ablated <- c(Ablated,"GD"==condition[[i]][1])
}
condition <- Ablated
Ablated <- as.data.frame(Ablated)
row.names(Ablated) <- colnames(oligos.integrated)
oligos.integrated <- AddMetaData(oligos.integrated,Ablated)
table(oligos.integrated$seurat_clusters,oligos.integrated$Ablated)
Now we attempt to transfer the cluster labels of the Science dataset onto the 10X dataset.
DimPlot(oligos.integrated, group.by = c("seurat_clusters"), combine = FALSE)
DimPlot(oligos.integrated, group.by = c("predicted.id"), combine = FALSE)
DimPlot(oligos.integrated, group.by = c("Sample"), combine = FALSE)
As you can see the Science paper clusters are showing what we could determine with the markers as well, most if not all of the DTA effect of the MOLs is located into a single OL population - MOL5. OPCs ofcourse also have a DTA effect.
table(oligos.integrated$predicted.id,oligos.integrated$Ablated)
data <- as.data.frame(table(oligos.integrated$Sample,oligos.integrated$predicted.id))
colnames(data) <- c("Condition","Cluster","Freq")
library(plyr)
data$Cluster <- factor(data$Cluster,levels=c("OPC","COP","NFOL1","NFOL2","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6"))
#data$Cluster <- revalue(as.factor(data$Cluster),c("PPR"="VLMC"))
# Stacked + percent
ggplot(data, aes(fill=Condition, y=Freq, x=Cluster)) +
geom_bar(position="fill", stat="identity")
ggplot(data, aes(fill=Condition, y=Freq, x=Cluster)) +
geom_bar( stat="identity") + scale_y_log10()
data <- as.data.frame(table(oligos.integrated$Sample,oligos.integrated$seurat_clusters))
colnames(data) <- c("Condition","Cluster","Freq")
library(plyr)
#data$Cluster <- factor(data$Cluster,levels=c("OPC","COP","NFOL1","NFOL2","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6"))
#data$Cluster <- revalue(as.factor(data$Cluster),c("PPR"="VLMC"))
# Stacked + percent
ggplot(data, aes(fill=Condition, y=Freq, x=Cluster)) +
geom_bar(position="fill", stat="identity")
ggplot(data, aes(fill=Condition, y=Freq, x=Cluster)) +
geom_bar( stat="identity")
data <- as.data.frame(table(oligos.integrated$Sample,oligos.integrated$predicted.id))
colnames(data) <- c("Condition","Cluster","Freq")
library(plyr)
data$Cluster <- factor(data$Cluster,levels=c("OPC","COP","NFOL1","NFOL2","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6"))
library(reshape2)
datacasted <- dcast(data,Cluster ~ Condition)
calc_cpm <-function (expr_mat)
{
norm_factor <- colSums(expr_mat)
return(t(t(expr_mat)/norm_factor)) * 10^50
}
datacasted[,2:4] <- calc_cpm(datacasted[,2:4])
data <- melt(datacasted)
colnames(data) <- c("Condition","Cluster","Freq")
#data$Cluster <- revalue(as.factor(data$Cluster),c("PPR"="VLMC"))
# Stacked + percent
ggplot(data, aes(fill=Cluster, y=Freq, x=Condition)) +
geom_bar(position="fill", stat="identity")
ggplot(data, aes(fill=Cluster, y=Freq, x=Condition)) +
geom_bar( stat="identity")
library(heatmap3)
library(viridis)
comparison <-scale(t(scale(table(oligos.integrated$Sample,oligos.integrated$seurat_clusters))))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=colorRampPalette(c("limegreen","black",
"firebrick3"))(1024),balanceColor =T,cexRow = 2,cexCol = 2,margins = c(10, 10))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=viridis(1000),balanceColor =T,cexRow = 2,cexCol = 2,margins = c(10, 10))
library(heatmap3)
library(viridis)
ClustersScience <- factor(oligos.integrated$predicted.id,levels=c("OPC","COP","NFOL1","NFOL2","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6"))
comparison <-scale(t(scale(table(oligos.integrated$Sample,ClustersScience))))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=colorRampPalette(c("limegreen","black",
"firebrick3"))(1024),balanceColor =T,cexRow = 2,cexCol = 2,margins = c(10, 10))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=viridis(1000),balanceColor =F,cexRow = 2,cexCol = 2,margins = c(10, 10))
library(heatmap3)
library(viridis)
ClustersScience <- factor(oligos.integrated$predicted.id,levels=c("OPC","COP","NFOL1","NFOL2","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6"))
comparison <-t(scale(t(scale(table(ClustersScience,oligos.integrated$Sample)))))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=colorRampPalette(c("limegreen","black",
"firebrick3"))(1024),balanceColor =T,cexRow = 2,cexCol = 2,margins = c(10, 10))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=viridis(1000),balanceColor =F,cexRow = 2,cexCol = 2,margins = c(10, 10))
data <- as.data.frame(table(oligos.integrated$Sample,oligos.integrated$predicted.id))
colnames(data) <- c("Condition","Cluster","Freq")
library(plyr)
data$Cluster <- factor(data$Cluster,levels=c("OPC","COP","NFOL1","NFOL2","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6","PPR"))
library(reshape2)
datacasted <- dcast(data,Cluster ~ Condition)
calc_cpm <-function (expr_mat)
{
norm_factor <- colSums(expr_mat)
return(t(t(expr_mat)/norm_factor))
}
datacasted[,2:4] <- calc_cpm(datacasted[,2:4])*100
row.names(datacasted) <- datacasted[,1]
datacasted <- datacasted[,2:4]
comparison <-t(scale(t(datacasted)))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=viridis(1000),balanceColor =F,cexRow = 2,cexCol = 2,margins = c(10, 10))
comparison <-datacasted-apply(datacasted,1,function(x) mean(x))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=viridis(1000),balanceColor =F,cexRow = 2,cexCol = 2,margins = c(10, 10))
heatmap3(comparison[rev(row.names(comparison)),], Rowv = NA , Colv = NA ,scale = "none",symm = F, method = "ward.D2",col=colorRampPalette(c("limegreen","black",
"firebrick3"))(1024),balanceColor =T,cexRow = 2,cexCol = 2,margins = c(10, 10))
data <- as.data.frame(table(oligos.integrated$Sample,oligos.integrated$seurat_clusters_rn))
colnames(data) <- c("Condition","Cluster","Freq")
library(plyr)
data$Cluster <- factor(data$Cluster,levels=c("OPC","COP","NFOL1","NFOL2","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6","PPR"))
library(reshape2)
datacasted <- dcast(data,Cluster ~ Condition)
calc_cpm <-function (expr_mat)
{
norm_factor <- colSums(expr_mat)
return(t(t(expr_mat)/norm_factor))
}
datacasted[,2:4] <- calc_cpm(datacasted[,2:4])*100
row.names(datacasted) <- datacasted[,1]
datacasted <- datacasted[,2:4]
datamelted <- melt(t(datacasted))
datamelted$Var2 <- as.factor(datamelted$Var2)
ggplot(datamelted, aes(y = value, x = Var2)) + # Move y and x here so than they can be used in stat_*
geom_dotplot(aes(fill = Var1), # Use fill = Species here not in ggplot()
binaxis = "y", # which axis to bin along
binwidth = 1.25, # Minimal difference considered diffeerent
stackdir = "center",
position = position_jitter(0.2)# Centered
) + scale_fill_manual(values=c("#70BF45","#5675D6","#C9502B"))+# scale_y_log10() +
stat_summary(fun.y = mean, fun.ymin = mean, fun.ymax = mean,
geom = "crossbar", width = 0.5,fatten = 0.01) + theme(axis.text.x = element_text(angle = 45))
To establish what genes are shifted/upregulated/downregulated, I perform a tour de force (just pressing a button) to calculate for each individual population (not including clusters with very few ablated cells)
OPC COP MFOL2 MOL1 MOL2 MOL3 MOL4 MOL5 MOL6
I employ adjusted p-values, so low numbers of cells will not show significance, especially as low effect sizes in those populations.
The dashed line indicates the threshold for p<0.01
library(ggrepel)
SetIdent(oligos.integrated,value=oligos.integrated@meta.data$predicted.id)
Idents(oligos.integrated) <- oligos.integrated@meta.data$predicted.id
oligos.integrated$cluster <- oligos.integrated$predicted.id
oligos.integrated$celltype.sample <- paste(Idents(oligos.integrated), oligos.integrated$Ablated, sep = "_")
oligos.integrated$celltype <- Idents(oligos.integrated)
Idents(oligos.integrated) <- "celltype.sample"
table(Idents(oligos.integrated))
DefaultAssay(oligos.integrated) <- "SCT"
oligos.integrated.samplediffOPCRNA <- FindMarkers(oligos.integrated, ident.1 = "OPC_TRUE", ident.2 = "OPC_FALSE", verbose = FALSE,logfc.threshold = 0.1,min.pct=0)
#head(oligos.integrated.samplediffOPCRNA, n = 50)
diffmatrix <- oligos.integrated.samplediffOPCRNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj+1e-300)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point(size=0.2)+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 1e-30 & abs(avg_logFC) > 0), label=row.names(subset(diffmatrix, p_val_adj < 1e-30 & abs(avg_logFC) > 0)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffCOPRNA <- FindMarkers(oligos.integrated, ident.1 = "COP_TRUE", ident.2 = "COP_FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffCOPRNA, n = 50)
diffmatrix <- oligos.integrated.samplediffCOPRNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffMFOL2RNA <- FindMarkers(oligos.integrated, ident.1 = "MFOL2_TRUE", ident.2 = "MFOL2_FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffMFOL2RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffMFOL2RNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffMOL1RNA <- FindMarkers(oligos.integrated, ident.1 = "MOL1_TRUE", ident.2 = "MOL1_FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffMOL1RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffMOL1RNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffMOL2RNA <- FindMarkers(oligos.integrated, ident.1 = "MOL2_TRUE", ident.2 = "MOL2_FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffMOL2RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffMOL2RNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffMOL3RNA <- FindMarkers(oligos.integrated, ident.1 = "MOL3_TRUE", ident.2 = "MOL3_FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffMOL3RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffMOL3RNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffMOL4RNA <- FindMarkers(oligos.integrated, ident.1 = "MOL4_TRUE", ident.2 = "MOL4_FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffMOL4RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffMOL4RNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffMOL5RNA <- FindMarkers(oligos.integrated, ident.1 = c("MOL5_TRUE","MOL6_TRUE"), ident.2 = c("MOL5_FALSE","MOL6_FALSE"), verbose = FALSE,logfc.threshold = 0.1,min.pct=0)
#head(oligos.integrated.samplediffMOL5RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffMOL5RNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
oligos.integrated.samplediffMOL6RNA <- FindMarkers(oligos.integrated, ident.1 = "MOL6_TRUE", ident.2 = "MOL6_FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffMOL6RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffMOL6RNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.3)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
Idents(oligos.integrated) <- "Ablated"
oligos.integrated.samplediffAllRNA <- FindMarkers(oligos.integrated, ident.1 = "TRUE", ident.2 = "FALSE", verbose = FALSE)
#head(oligos.integrated.samplediffAllRNA, n = 50)
diffmatrix <- oligos.integrated.samplediffAllRNA
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.35), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.35)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
#venndiagram
library(VennDiagram)
As you can see from the plots, MOL5 takes the lions share of the DTA effect, which seems to be concentrated in the MOL5 and MOL3 clusters. MOL3 is interestingly predicted in cells very close to the DTA clusters in the UMAP.
The other major part of the DTA effect is located in the OPC cluster.
Finally, the last plot is a differential expression result of all the cells, and as you can see MOLs share many genes in the effect, but OPC shows slightly different genes.
To illustrate these effects, I will calculate the top 20 genes of the DTA effect across all cells and see how the major differentially expressed genes translate to the clusters/UMAP position.
top5 <- row.names(head(oligos.integrated.samplediffAllRNA,20))
DefaultAssay(oligos.integrated) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
top5 <- row.names(head(oligos.integrated.samplediffAllRNA,20))
FeaturePlot(oligos.integrated, features = top5,pt.size = 0.1,ncol = 2)
DefaultAssay(oligos.integrated) <- "SCT"
And here are violinplots of the same genes but now organised per cluster and ablation condition. Ablated = TRUE
DefaultAssay(oligos.integrated) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
# oligosintegrated.list <- SplitObject(oligos.integrated, split.by = "Ablated")
# oligos.integrated.cc <- merge(oligosintegrated.list[["FALSE"]], y = oligosintegrated.list[["TRUE"]], merge.data = TRUE)
plots <- VlnPlot(oligos.integrated, features = top5, split.by = "Ablated", group.by = "celltype",
pt.size = 0, combine = FALSE)
CombinePlots(plots = plots, ncol = 1)
#rm(oligosintegrated.list)
DefaultAssay(oligos.integrated) <- "SCT"
Now that this is done, it’s clear the OLs and OPCs have some differences regarding to the genes affected by ablation. Now, to get any sort of meaningfull insight into the DTA effect, it seems logical to separate the effect by cluster to avoid MOL-state expression to influence the analysis.
If you remember MOL5 is the most affected of the MOLs or in any case the most present in the dataset. Furthermore, the DTA effect seemed to be comparable between the OL clusters, therefore to focus completely on the DTA effect I will only focus on MOL5, because it has many differentially expressed genes.
Here I plot the clusters again for reference.
DimPlot(oligos.integrated, group.by = c("Sample"), combine = FALSE)
DimPlot(oligos.integrated, group.by = c("seurat_clusters"), combine = FALSE)
DimPlot(oligos.integrated, group.by = c("predicted.id"), combine = FALSE)
If you remember from the Seurat clustering, MOL5 is subclustered into 4 clusters. For the analysis to proceed I will make the assumption that we have healthy and affected states of MOL5 captured in the Seurat clusters.
Furthermore, it seems that we have two distinct DTA states reached and that we can divide MOL5 into two sides (as the UMAP is suggesting). Here I plot 3 genes that show that MOL5 has heterogenous expression within the clusters broadly dividing the cluster in two (seemingly regardless of DTA condition).
So, therefore I make a second assumption that Seurat clusters 2 and 1 represent a pair of MOL5 subclusters that represent the closest states to each other, most likely representing two sides of the same coin, and Seurat clusters 3 and 5 represent another pair of clusters closely related to each other.
Therefore it would make sense to compare Seurat cluster 2 vs 1 and 3 vs 5 and treat them as 2 diffent biological processes that are altered following ablation.
Here I plot the genes showing MOL5 can be divided into roughly two clusters.
DefaultAssay(oligos.integrated) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
FeaturePlot(oligos.integrated, c("Apod", "Tmsb4x","Tppp3"),pt.size = 1)
DefaultAssay(oligos.integrated) <- "SCT"
Here I intersect the results of a couple of clusters to get some shared and non-shared genes differentially expressed regarding the ablation. Showing adjusted p-values.
The take-away should be the first two plots,
MOL5 specific ablation effect, compared to OPC
OPC specific ablation effect, compared to MOL5
Then we have
MOL5 shared with OPC
OPC shared with MOL5
MOL5 shared with MOL3
OPC_diff <- subset(oligos.integrated.samplediffOPCRNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)
MOL5_diff <- subset(oligos.integrated.samplediffMOL5RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)
MOL4_diff <- subset(oligos.integrated.samplediffMOL4RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)
MOL3_diff <- subset(oligos.integrated.samplediffMOL3RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)
OPC_MOL5shared <- intersect(row.names(OPC_diff),row.names(MOL5_diff))
OPC_MOL5nonshared <- setdiff(row.names(OPC_diff),row.names(MOL5_diff))
OPC_onlycomparedtoMOL5<- row.names(OPC_diff)[!row.names(OPC_diff) %in% row.names(MOL5_diff)]
MOL5_onlycomparedtoOPC<- row.names(MOL5_diff)[!row.names(MOL5_diff) %in% row.names(OPC_diff)]
MOL5_4shared <- intersect(row.names(MOL4_diff),row.names(MOL5_diff))
MOL5_3shared<- intersect(row.names(MOL3_diff),row.names(MOL5_diff))
MOL5_4_3shared <- intersect(row.names(MOL3_diff),intersect(row.names(MOL4_diff),row.names(MOL5_diff)))
OPC_MOL5shared
OPC_onlycomparedtoMOL5
MOL5_onlycomparedtoOPC
View(c(MOL5_onlycomparedtoOPC,OPC_MOL5shared))
View(c(OPC_onlycomparedtoMOL5,OPC_MOL5shared))
View(c(MOL5_4shared,OPC_MOL5shared))
diffmatrix <- MOL5_diff[MOL5_onlycomparedtoOPC,]
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
diffmatrix <- OPC_diff[OPC_onlycomparedtoMOL5,]
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
diffmatrix <- MOL5_diff[OPC_MOL5shared,]
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
diffmatrix <- OPC_diff[OPC_MOL5shared,]
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
diffmatrix <- MOL5_diff[MOL5_3shared,]
diffmatrix$logp_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=logp_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
I made the second assumption that Seurat clusters 2 and 1 represent a pair of MOL5 subclusters that represent the closest states to each other, most likely representing two sides of the same coin, and Seurat clusters 3 and 5 represent another pair of clusters closely related to each other.
Therefore it would make sense to compare Seurat cluster 2 vs 1 and 3 vs 5 and treat them as 2 diffent biological processes that are altered following ablation. Which I do here:
library(ggrepel)
SetIdent(oligos.integrated,value=oligos.integrated@meta.data$seurat_clusters)
Idents(oligos.integrated) <- oligos.integrated@meta.data$seurat_clusters
oligos.integrated$cluster <- oligos.integrated$seurat_clusters
oligos.integrated$celltype.sample <- paste(Idents(oligos.integrated), oligos.integrated$Ablated, sep = "_")
oligos.integrated$celltype <- Idents(oligos.integrated)
#Idents(oligos.integrated) <- "celltype.sample"
#table(Idents(oligos.integrated))
DefaultAssay(oligos.integrated) <- "SCT"
oligos.integrated.samplediffDTA2vs1RNA <- FindMarkers(oligos.integrated, ident.1 = "2", ident.2 = c("1","5"), verbose = FALSE,logfc.threshold = 0,min.pct=0)
#head(oligos.integrated.samplediffDTA2vs1RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffDTA2vs1RNA
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=log_p_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, log_p_val > 100 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, log_p_val > 100 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.8)
#View(subset(oligos.integrated.samplediffDTA2vs1RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25))
oligos.integrated.samplediffDTA3vs5RNA <- FindMarkers(oligos.integrated, ident.1 = "3", ident.2 = c("1","5"), verbose = FALSE,logfc.threshold = 0,min.pct=0)
#head(oligos.integrated.samplediffDTA3vs5RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffDTA3vs5RNA
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=log_p_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.45), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.45)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.8)
oligos.integrated.samplediffDTA2vs3RNA <- FindMarkers(oligos.integrated, ident.1 = "2", ident.2 = "3", verbose = FALSE,logfc.threshold = 0.25,min.pct=0.1)
#head(oligos.integrated.samplediffDTA3vs5RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffDTA2vs3RNA
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj+1e-300)
ggplot(diffmatrix,aes(avg_logFC,y=log_p_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 1e-200 & abs(avg_logFC) > 0), label=row.names(subset(diffmatrix, p_val_adj < 1e-200 & abs(avg_logFC) > 0)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.8)
oligos.integrated.samplediffDTA23vs15RNA <- FindMarkers(oligos.integrated, ident.1 = c("2","3"), ident.2 = c("1","5"), verbose = FALSE,logfc.threshold = 0,min.pct=0)
#head(oligos.integrated.samplediffDTA3vs5RNA, n = 50)
diffmatrix <- oligos.integrated.samplediffDTA23vs15RNA
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj+1e-300)
ggplot(diffmatrix,aes(avg_logFC,y=log_p_val,label=row.names(diffmatrix)))+ geom_point(size=0.2)+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 1e-100 & abs(avg_logFC) > 0), label=row.names(subset(diffmatrix, p_val_adj < 1e-100 & abs(avg_logFC) > 0)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.8)
#View(subset(oligos.integrated.samplediffDTA3vs5RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25))
DTA_diff <- subset(oligos.integrated.samplediffDTA23vs15RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)
DTA_diff_UL <- subset(oligos.integrated.samplediffDTA2vs3RNA,row.names(oligos.integrated.samplediffDTA2vs3RNA) %in% row.names(DTA_diff))
diffmatrix <- DTA_diff_UL
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj+1e-300)
ggplot(diffmatrix,aes(avg_logFC,y=log_p_val,label=row.names(diffmatrix)))+ geom_point(size=0.2)+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 1e-30 & abs(avg_logFC) > 0), label=row.names(subset(diffmatrix, p_val_adj < 1e-30 & abs(avg_logFC) > 0)))+xlab("log2_FC") + ylab("-log10_p-value") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.8)
Idents(oligos.integrated) <- "Ablated"
oligos.integrated.samplediffAllRNA <- FindMarkers(oligos.integrated, ident.1 = "TRUE", ident.2 = "FALSE", verbose = FALSE)
head(oligos.integrated.samplediffAllRNA, n = 50)
diffmatrix <- oligos.integrated.samplediffAllRNA
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj)
ggplot(diffmatrix,aes(avg_logFC,y=log_p_val,label=row.names(diffmatrix)))+ geom_point()+ geom_text_repel(data=subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25), label=row.names(subset(diffmatrix, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value_adj") + geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.5)
DiffMatrix <- list()
diffmatrixnames <- c("oligos.integrated.samplediffDTA2vs3RNA",
"oligos.integrated.samplediffDTA23vs15RNA")
do.call(head,as.list(as.name(diffmatrixnames[1])))
library(clusterProfiler)
#Convert to gencode using biomart
library(biomaRt)
listMarts()
ensembl = useMart("ensembl",dataset="mmusculus_gene_ensembl")
listDatasets(ensembl)
attributes = listAttributes(ensembl)
Biomart_gencode_ensembl84_biotypes <- getBM(attributes=c("mgi_symbol","ensembl_gene_id","entrezgene_id","gene_biotype"), filters = "", values = "", ensembl)
Biomart_gencode_ensembl84_biotypes[, 'gene_biotype'] <- as.factor(Biomart_gencode_ensembl84_biotypes[,'gene_biotype'])
#Filter for only our genes
Biotype_All_dataset <- subset(Biomart_gencode_ensembl84_biotypes, mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
entrezID <- subset(Biotype_All_dataset, Biotype_All_dataset$mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
library(ReactomePA)
library(org.Mm.eg.db)
ReactomeTerms <- list()
i=1
#UP
pvaladj <- 0.01
logfc <- 0.2
for(i in 1:length(diffmatrixnames)){
diffmatrix <- do.call("as.data.frame",as.list(as.name(diffmatrixnames[i])))
diffmatrix <- subset(diffmatrix, p_val_adj < pvaladj & avg_logFC > logfc)
#siggenes <- head(row.names(diffmatrix),100)
siggenes <- row.names(diffmatrix)
entrezmatched <- entrezID[entrezID$mgi_symbol %in% siggenes,]
#entrezID <- entrezID[! apply(entrezID[,c(1,3)], 1,function (x) anyNA(x)),]
allLLIDs <- entrezmatched$entrezgene
modulesReactome <- enrichPathway(gene=allLLIDs,organism="mouse",pvalueCutoff=0.05,qvalueCutoff = 0.3,pAdjustMethod = "none", readable=T)
#modulesReactome <- enrichGO(gene=allLLIDs,"org.Mm.eg.db",pvalueCutoff=0.05,qvalueCutoff = 0.3,pAdjustMethod = "none", readable=T)
ReactomeTerms[[i]] <- modulesReactome
head(as.data.frame(modulesReactome))
print(i)
}
ReactomeTerms[which(lapply(ReactomeTerms,function(x) is.null(x))==TRUE)] <- "No_Genes"
#Add DOWN
pvaladj <- 0.01
logfc <- -0.25
offset <- length(ReactomeTerms)
for(i in 1:length(diffmatrixnames)){
i=i+offset
diffmatrix <- do.call("as.data.frame",as.list(as.name(diffmatrixnames[i-offset])))
diffmatrix <- subset(diffmatrix, p_val_adj < pvaladj & avg_logFC < logfc)
#siggenes <- head(row.names(diffmatrix),100)
siggenes <- row.names(diffmatrix)
entrezmatched <- entrezID[entrezID$mgi_symbol %in% siggenes,]
#entrezID <- entrezID[! apply(entrezID[,c(1,3)], 1,function (x) anyNA(x)),]
allLLIDs <- entrezmatched$entrezgene
modulesReactome <- enrichPathway(gene=allLLIDs,organism="mouse",pvalueCutoff=0.05,qvalueCutoff = 0.3,pAdjustMethod = "none", readable=T)
#modulesReactome <- enrichGO(gene=allLLIDs,"org.Mm.eg.db",pvalueCutoff=0.05,qvalueCutoff = 0.3,pAdjustMethod = "none", readable=T)
ReactomeTerms[[i]] <- modulesReactome
head(as.data.frame(modulesReactome))
print(i)
}
ReactomeTerms[which(lapply(ReactomeTerms,function(x) is.null(x))==TRUE)] <- "No_Genes"
Upper_diff <- subset(oligos.integrated.samplediffDTA2vs3RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0)
Lower_diff <- subset(oligos.integrated.samplediffDTA23vs15RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0)
AlldiffgenesHetMOL5 <- intersect(intersect(row.names(oligos.integrated.samplediffDTA2vs3RNA),row.names(oligos.integrated.samplediffDTA23vs15RNA)),unique(c(row.names(Upper_diff),row.names(Lower_diff))))
subset2 <- oligos.integrated.samplediffDTA2vs3RNA[AlldiffgenesHetMOL5,]
subset3 <- oligos.integrated.samplediffDTA23vs15RNA[AlldiffgenesHetMOL5,]
subsetMOL5 <- cbind(subset2,subset3)
colnames(subsetMOL5) <- make.unique(colnames(subsetMOL5))
diffmatrix <- subsetMOL5
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj+(1e-300))
q95pgenes1 <- row.names(diffmatrix[which(diffmatrix$log_p_val >= quantile(diffmatrix$log_p_val,0)),])
diffmatrix$log_p_val.1 <- -log10(diffmatrix$p_val_adj.1+(1e-300))
q95pgenes2 <- row.names(diffmatrix[which(diffmatrix$log_p_val.1 >= quantile(diffmatrix$log_p_val.1,0)),])
q95pgenes <- unique(c(q95pgenes1,q95pgenes2))
diffmatrix <- diffmatrix[q95pgenes,]
diffmatrix$avg_logFC[is.infinite(diffmatrix$avg_logFC)] <- max(diffmatrix$avg_logFC[!is.infinite(diffmatrix$avg_logFC)])
diffmatrix$avg_logFC.1[is.infinite(diffmatrix$avg_logFC.1)] <- max(diffmatrix$avg_logFC.1[!is.infinite(diffmatrix$avg_logFC.1)])
#diffmatrix$avg_logFC.1 <- 2*diffmatrix$avg_logFC.1
diffmatrix$combp <- -log10(diffmatrix$p_val_adj*diffmatrix$p_val_adj.1)
diffmatrix$maxp <- apply(cbind(diffmatrix$log_p_val,diffmatrix$log_p_val.1),1,function(x) max(x))
diffmatrix$minp <- apply(cbind(diffmatrix$p_val_adj,diffmatrix$p_val_adj.1),1,function(x) min(x))
diffmatrix$maxp[is.infinite(diffmatrix$maxp)] <- max(diffmatrix$maxp[!is.infinite(diffmatrix$maxp)])
diffmatrix$maxFC <- apply(cbind(diffmatrix$avg_logFC,diffmatrix$avg_logFC.1),1,function(x) max(abs(x)))
diffmatrix$Genes <- factor(row.names(diffmatrix),levels=row.names(diffmatrix))
ggplot(diffmatrix,aes(avg_logFC,y=avg_logFC.1,colour=maxp,label=row.names(diffmatrix)))+ geom_point(size=diffmatrix$maxp/200) + scale_colour_viridis_c(direction = +1,option = "viridis") + geom_hline(yintercept= 0,linetype="dashed",size=0.1,color="white")+
geom_hline(yintercept= 0.25,linetype="dashed",size=0.1,color="white",alpha=0.5)+
geom_hline(yintercept= -0.25,linetype="dashed",size=0.1,color="white",alpha=0.5)+
geom_vline(xintercept= 0,linetype="dashed",size=0.1,color="white")+
geom_vline(xintercept= 0.25,linetype="dashed",size=0.1,color="white",alpha=0.5)+
geom_vline(xintercept= -0.25,linetype="dashed",size=0.1,color="white",alpha=0.5)+
geom_text_repel(size=2.5,fontface = "bold",force=1,data=subset(diffmatrix,
maxp > quantile(diffmatrix$maxp,0.995) |
avg_logFC > 0.4 |
avg_logFC < -1 |
avg_logFC.1 > 0.25 |
avg_logFC.1 < -0.35),label=row.names(subset(diffmatrix,
maxp > quantile(diffmatrix$maxp,0.995) |
avg_logFC > 0.4 |
avg_logFC < -1 |
avg_logFC.1 > 0.25 |
avg_logFC.1 < -0.35)
))+xlab("IS vs WD") + ylab("Other vs Control") +theme(
# get rid of panel grids
panel.grid.major = element_blank(),
#panel.grid.major = element_line(color="darkgrey",size=0.1),
panel.grid.minor = element_blank(),
#panel.grid.minor = element_line(color="darkgrey",size=0.05),
# Change plot and panel background
plot.background=element_rect(fill = "white"),
panel.background = element_rect(fill = 'black'),
# Change legend
legend.background = element_rect(fill = "white", color = NA),
legend.key = element_rect(color = "gray", fill = "white"),
legend.title = element_text(color = "Black"),
legend.text = element_text(color = "black")
)
#magma,inferno, plasma,viridis
#scale_colour_gradient(low = "darkgreen", high = "red")
#Do reactome analysis at the bottom of script
i=1
j=1
for(i in 1:length(ReactomeTerms)){
pwydata <- as.data.frame(ReactomeTerms[[i]])
geneset <- strsplit(pwydata$geneID, "/")
FCmeans <- data.frame()
for(j in 1:length(geneset)){
if(length(geneset)>0){
geneset2FC <- geneset[[j]]
geneset2FC[which(geneset2FC %in% c("ND2"))] <- "mt-Nd2"
geneset2FC[which(geneset2FC %in% c("ND3"))] <- "mt-Nd3"
geneset2FC[which(geneset2FC %in% c("ND5"))] <- "mt-Nd5"
geneset2FC <- which(row.names(diffmatrix) %in% geneset2FC)
FC <- mean(diffmatrix$avg_logFC[geneset2FC],na.rm=T)
FCvar <- var(diffmatrix$avg_logFC[geneset2FC],na.rm=T)
FC.1 <- mean(diffmatrix$avg_logFC.1[geneset2FC],na.rm=T)
FC.1var <- var(diffmatrix$avg_logFC.1[geneset2FC],na.rm=T)
FCmeans <- rbind(FCmeans,cbind(FC,FC.1,FCvar,FC.1var))
}
}
ReactomeTerms[[i]] <- cbind(ReactomeTerms[[i]],FCmeans)
}
pathmatrix <- rbind(as.data.frame(ReactomeTerms[[1]]),as.data.frame(ReactomeTerms[[2]]),as.data.frame(ReactomeTerms[[3]]),as.data.frame(ReactomeTerms[[4]]))
pathmatrix$p.adjust_original <- pathmatrix$p.adjust
pathmatrix$p.adjust <- -log10(pathmatrix$p.adjust )
pathmatrix$maxFC <- sum(abs(pathmatrix$FC),abs(pathmatrix$FC.1))
pathmatrix <- subset(pathmatrix, pathmatrix$Count > 1)
pathmatrix$AdjSelect <- pathmatrix$p.adjust*(1000*(0.2+abs(pathmatrix$FC.1)))
pathmatrix$neglogqvalue <- -log10(pathmatrix$qvalue)
pathmatrix2 <- pathmatrix[duplicated(pathmatrix$geneID),]
pathmatrix <- pathmatrix[!duplicated(pathmatrix$geneID),]
#pathmatrix <- rbind(pathmatrix,pathmatrix2[!duplicated(pathmatrix2$geneID),])
ggplot(pathmatrix,aes(FC,y=FC.1,colour=p.adjust_original),label=pathmatrix$Description)+ geom_point(size=pathmatrix$Count,alpha=0.5) +scale_colour_viridis_c(direction = +1,option = "viridis") +
geom_hline(yintercept= 0,linetype="solid",size=0.5,color="black",alpha=0.5)+
geom_hline(yintercept= 0.25,linetype="solid",size=0.2,color="black",alpha=0.5)+
geom_hline(yintercept= -0.25,linetype="solid",size=0.2,color="black",alpha=0.5)+
geom_vline(xintercept= 0,linetype="solid",size=0.5,color="black",alpha=0.5)+
geom_vline(xintercept= 0.25,linetype="solid",size=0.2,color="black",alpha=0.5)+
geom_vline(xintercept= -0.25,linetype="solid",size=0.2,color="black",alpha=0.5)+
geom_text_repel(size=2,fontface="bold",force=20,data=
subset(pathmatrix,
abs(pathmatrix$AdjSelect) > quantile(
abs(pathmatrix$AdjSelect),1,na.rm=T) | abs(pathmatrix$p.adjust) > quantile(
abs(pathmatrix$p.adjust),0.75,na.rm=T) |
abs(pathmatrix$FC.1) > quantile(abs(pathmatrix$FC.1),1,na.rm=T)),
label=subset(pathmatrix,
abs(pathmatrix$AdjSelect) > quantile(abs(pathmatrix$AdjSelect),1,na.rm=T) |
abs(pathmatrix$p.adjust) > quantile(abs(pathmatrix$p.adjust),0.75,na.rm=T) |
abs(pathmatrix$FC.1) > quantile(abs(pathmatrix$FC.1),1,na.rm=T))$Description,box.padding = 0.5)+xlab("IS vs WD") + ylab("Other vs Control")
# +theme(
# # get rid of panel grids
# panel.grid.major = element_blank(),
# panel.grid.minor = element_blank(),
# # Change plot and panel background
# plot.background=element_rect(fill = "white"),
# panel.background = element_rect(fill = 'black'),
# # Change legend
# legend.background = element_rect(fill = "white", color = NA),
# legend.key = element_rect(color = "gray", fill = "white"),
# legend.title = element_text(color = "Black"),
# legend.text = element_text(color = "black")
# )
#scale_colour_gradient(low = "yellow", high = "red") +
#scale_colour_viridis_c(direction = -1)
#scale_colour_gradient(low = "black", high = "red")
diffmatrix <- diffmatrix[row.names(Lower_diff),]
ggplot(diffmatrix,aes(avg_logFC.1,y=avg_logFC,color=avg_logFC,label=row.names(diffmatrix)))+ geom_point(size=1,alpha=1)+scale_colour_gradient2(low = "yellow",mid="black" ,high = "red")+ geom_text_repel(fontface="plain",data=subset(diffmatrix, p_val_adj.1 < 0.01 & abs(avg_logFC.1) > 0.25),label=row.names(subset(diffmatrix, p_val_adj.1 < 0.01 & abs(avg_logFC.1) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value") #+ geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.8)
ggplot(diffmatrix,aes(avg_logFC.1,y=log_p_val.1,color=avg_logFC,label=row.names(diffmatrix)))+ geom_point(size=1)+scale_colour_gradient2(low = "yellow",mid="black" ,high = "red")+ geom_text_repel(fontface="plain",data=subset(diffmatrix, p_val_adj.1 < 0.01 & abs(avg_logFC.1) > 0.25),label=row.names(subset(diffmatrix, p_val_adj.1 < 0.01 & abs(avg_logFC.1) > 0.25)))+xlab("log2_FC") + ylab("-log10_p-value")
FC <- diffmatrix$avg_logFC
names(FC) <- row.names(diffmatrix)
cnetplot(ReactomeTerms[[3]], showCategory = 3,categorySize="pvalue", foldChange=FC)
This will hopefully have given us two sets of differentially expressed genes, that should have minimal effect of the MOL-state effect, and should instead lay bare the DTA effect clearly.
By now you propably have seen recurring genes, very similar to the list of genes we already had before in the paper, below I will try to tease out what might be going wrong/is compensated in the ablated MOL5 cells. And with a bit of luck this reflects a general mechanism in the ablated population.
Here I make a distinction between “Upper” and “Lower”, simply referring to the MOL5 upper two clusters, and lower two clusters of the UMAP respectively.
Here is the genelist of the genes shared between the Upper and Lower clusters, in terms of DTA effect.
Upper_diff <- subset(oligos.integrated.samplediffDTA2vs1RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)
Lower_diff <- subset(oligos.integrated.samplediffDTA3vs5RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)
AlldiffgenesHetMOL5 <- unique(c(row.names(Upper_diff),row.names(Lower_diff)))
subset2 <- oligos.integrated.samplediffDTA2vs1RNA[AlldiffgenesHetMOL5,]
subset3 <- oligos.integrated.samplediffDTA3vs5RNA[AlldiffgenesHetMOL5,]
subsetMOL5 <- cbind(subset2,subset3)
colnames(subsetMOL5) <- make.unique(colnames(subsetMOL5))
diffmatrix <- subsetMOL5
diffmatrix$log_p_val <- -log10(diffmatrix$p_val_adj)
q95pgenes1 <- row.names(diffmatrix[which(diffmatrix$log_p_val >= quantile(diffmatrix$log_p_val,0)),])
diffmatrix$log_p_val.1 <- -log10(diffmatrix$p_val_adj.1)
q95pgenes2 <- row.names(diffmatrix[which(diffmatrix$log_p_val.1 >= quantile(diffmatrix$log_p_val.1,0)),])
q95pgenes <- unique(c(q95pgenes1,q95pgenes2))
diffmatrix <- diffmatrix[q95pgenes,]
# diffmatrix$avg_logFC <- runif(nrow(diffmatrix),min=0,max=100)
# diffmatrix$avg_logFC <- diffmatrix$avg_logFC-mean(diffmatrix$avg_logFC)
# diffmatrix$avg_logFC.1 <- runif(nrow(diffmatrix),min=0,max=100)
# diffmatrix$avg_logFC.1 <- diffmatrix$avg_logFC.1-mean(diffmatrix$avg_logFC.1)
diffmatrix$logFCsumsubstract <- diffmatrix$avg_logFC-diffmatrix$avg_logFC.1
diffmatrix$logFCsum <- diffmatrix$avg_logFC.1+diffmatrix$avg_logFC
diffmatrix$pvalsum <- diffmatrix$log_p_val-diffmatrix$log_p_val.1
diffmatrix$maxp <- apply(cbind(diffmatrix$log_p_val,diffmatrix$log_p_val.1),1,function(x) max(x))
diffmatrix$maxp[is.infinite(diffmatrix$maxp)] <- max(diffmatrix$maxp[!is.infinite(diffmatrix$maxp)])
diffmatrix$maxFC <- apply(cbind(diffmatrix$avg_logFC,diffmatrix$avg_logFC.1),1,function(x) max(x))
diffmatrix <- diffmatrix[order(diffmatrix$logFCsum,decreasing = TRUE),]
diffmatrix$order <- seq_len(nrow(diffmatrix))
diffmatrix$Genes <- factor(row.names(diffmatrix),levels=row.names(diffmatrix))
ggplot(diffmatrix,aes(logFCsumsubstract,y=logFCsum,colour=maxp,label=row.names(diffmatrix)))+ geom_point(size=diffmatrix$maxp/75) +scale_colour_gradient(low = "black", high = "red") + geom_text_repel(data=subset(diffmatrix, maxp > quantile(diffmatrix$maxp,0.8) & abs(avg_logFC) > 0), label=row.names(subset(diffmatrix, maxp > quantile(diffmatrix$maxp,0.8) & abs(avg_logFC) > 0)))+xlab("log2_FC") + ylab("-log10_p-value")
#+geom_density2d() #+ geom_hline(yintercept=-log10(0.01),linetype="dashed",size=0.8)
FeaturePlot(oligos.integrated, c("Tmsb4x","Mag","Ppia","Enpp2","Cd81","Apod","Mag","Ywhaq","Qk"),pt.size = 0.1)
FeaturePlot(oligos.integrated, c("Tmsb4x","Tpt1","Fth1","Cnp","Cldn11","Itm2b","Lamp1","Trf"),pt.size = 0.1)
library(ggplot2)
library(scales)
theme_set(theme_classic())
# Plot
ggplot(diffmatrix, aes(x=Genes, y=logFCsum)) +
geom_point(col="tomato2", size=abs(diffmatrix$maxFC*10)) + # Draw points
geom_segment(aes(x=Genes,
xend=Genes,
y=min(logFCsum),
yend=max(logFCsum)),
linetype="dashed",
size=0.1) + # Draw dashed lines
labs(title="MOL5 Upper Vs Lower")+
coord_flip()
Alldiff <- rbind(Upper_diff,Lower_diff)
Alldiff$Gene <- row.names(Alldiff)
Shared <- intersect(row.names(Upper_diff),row.names(Lower_diff))
Upper_specific <- setdiff(row.names(Upper_diff),row.names(Lower_diff))
Lower_specific <- setdiff(row.names(Lower_diff),row.names(Upper_diff))
Shared
Lets have a closer look at the proteins expressed by the genes in these gene lists.
For this we will use the STRING database, although this is only compatible with the version 10 database. In the folder I have added the version 11 analysis, which is far more detailed, and I have manually added expression information in the way of colored halos around the genes, which I will come back to later in the analysis.
library("STRINGdb")
string_db <- STRINGdb$new( version="10", species=10090, score_threshold=400, input_directory="" )
LowerDTA <- Lower_diff
LowerDTA$Gene <- row.names(LowerDTA)
UpperDTA <- Upper_diff
UpperDTA$Gene <- row.names(UpperDTA)
DTAOL <- rbind(LowerDTA,UpperDTA)
DTAOL_mapped <- string_db$map( DTAOL, "Gene", removeUnmappedRows = TRUE )
hits <- DTAOL_mapped$STRING_id
string_db$plot_network( hits )
Now we make a new object and use only the OL found DTA genes to make a tsne and UMAP and cluster them.
Generating the UMAP and TSNE.
If these genes are describing some OL process, maturation or functional it would be interesting to see how well the 151 differentially expressed genes describe OL heterogeneity.
featuresDTA <- unique(c(row.names(subset(oligos.integrated.samplediffDTA2vs1RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25)),row.names(subset(oligos.integrated.samplediffDTA3vs5RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.25))))
oligos.integratedDTA <- RunPCA(oligos.integrated, verbose = FALSE,features=featuresDTA)
ElbowPlot(oligos.integratedDTA)
oligos.integratedDTA <- RunUMAP(oligos.integratedDTA, dims = 1:11)
oligos.integratedDTA <- RunTSNE(oligos.integratedDTA, dims = 1:11)
plots <- DimPlot(oligos.integratedDTA, group.by = c("Sample"), combine = FALSE)
plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
byrow = TRUE, override.aes = list(size = 3))))
CombinePlots(plots)
plots <- TSNEPlot(oligos.integratedDTA, group.by = c("Sample"), combine = FALSE)
plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
byrow = TRUE, override.aes = list(size = 3))))
CombinePlots(plots)
oligos.integratedDTA <- FindNeighbors(oligos.integratedDTA, dims = 1:11)
oligos.integratedDTA <- FindClusters(oligos.integratedDTA,algorithm = 4,resolution = 0.6)
DimPlot(oligos.integratedDTA, group.by = c("seurat_clusters"), combine = FALSE)
DimPlot(oligos.integratedDTA, group.by = c("predicted.id"), combine = FALSE)
DimPlot(oligos.integratedDTA, group.by = c("Sample"), combine = FALSE)
From the UMAP it seems that these 151 genes do allow us to separate the major OL clusters, and the UMAP seems to place the OPC-COP-NFOL1-NFOL2 progression correctly.
The DTA genes also seem to allow us to cluster the MOLs and even the OPCs.
oligos.integratedDTA.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
Below follows the heatmap showing the top 10 genes based on fold change for each cluster.
DefaultAssay(oligos.integratedDTA) <- "SCT"
top10 <- oligos.integratedDTA.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(oligos.integratedDTA, features = top10$gene) + NoLegend()
And here are the top 2 genes found for each cluster as shown on the UMAP.
DefaultAssay(oligos.integratedDTA) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
top2 <- oligos.integratedDTA.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
FeaturePlot(oligos.integratedDTA, features = top2$gene,pt.size = 0.1)
DefaultAssay(oligos.integratedDTA) <- "SCT"
Here I show expression of some common DTA genes that I know are supposed to be more or less affected, based on the differential expression, and the connectedness in the STRINGdb network.
DefaultAssay(oligos.integratedDTA) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
FeaturePlot(oligos.integratedDTA, c("Itm2b","App", "Mapt","Trf","Ywhaq","Kif1b","Tuba1a", "Dync1h1","Dst","Ank2", "Ank3", "Nfasc","Cntn2","Tppp","Ncam1","Mbp","Car2","Ubb","Prdx1","Fth1","Vdac2","Atp5f1","Sepp1","Hopx","Opalin","Ptgds","Il33"),pt.size = 0.1)
DefaultAssay(oligos.integratedDTA) <- "SCT"
For reference and to check what the expression of healthy OLs should look like for these DTA genes, we make a new object of the Science dataset (without OPC and COP) and use only the DTA genes to make a tsne and UMAP and cluster them, and get the markers. Generating the dataset, UMAP, and TSNE.
oligos.integratedScience <- subset(Science,cell_class %in% c("NFOL1","NFOL2","MFOL1","MFOL2","MOL1","MOL2","MOL3","MOL4","MOL5","MOL6"))
oligos.integratedScience <- RunPCA(oligos.integratedScience, verbose = FALSE,features=featuresDTA)
ElbowPlot(oligos.integratedScience)
oligos.integratedScience <- RunUMAP(oligos.integratedScience, dims = 1:13,n.neighbors = 20)
#oligos.integratedScience <- RunTSNE(oligos.integratedScience, dims = 1:10)
plots <- DimPlot(oligos.integratedScience, group.by = c("cell_class"), combine = FALSE)
plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
byrow = TRUE, override.aes = list(size = 3))))
CombinePlots(plots)
# plots <- TSNEPlot(oligos.integratedScience, group.by = c("cell_class"), combine = FALSE)
# plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
# byrow = TRUE, override.aes = list(size = 3))))
# CombinePlots(plots)
oligos.integratedScience <- FindNeighbors(oligos.integratedScience, dims = 1:13)
oligos.integratedScience <- FindClusters(oligos.integratedScience,algorithm = 4,resolution = 0.6)
DimPlot(oligos.integratedScience, group.by = c("seurat_clusters"), combine = FALSE)
DimPlot(oligos.integratedScience, group.by = c("cell_class"), combine = FALSE)
oligos.integratedScience.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
Below follows the heatmap showing the top 10 genes based on fold change for each cluster.
DefaultAssay(oligos.integratedScience) <- "SCT"
top10 <- oligos.integratedScience.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(oligos.integratedScience, features = top10$gene) + NoLegend()
And here are the top 2 genes found for each cluster as shown on the UMAP.
DefaultAssay(oligos.integratedScience) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
top2 <- oligos.integratedScience.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
FeaturePlot(oligos.integratedScience, features = top2$gene,pt.size = 0.1)
DefaultAssay(oligos.integratedScience) <- "SCT"
Here I show expression of some common DTA genes that I know are supposed to be more or less affected, to compared them with the expression of the DTA dataset above.
DefaultAssay(oligos.integratedScience) <- "SCT"
# Normalize RNA data for visualization purposes
#oligos.integrated <- NormalizeData(oligos.integrated, verbose = FALSE)
FeaturePlot(oligos.integratedScience, c("Itm2b","Scd2","App", "Mapt","Trf","Ywhaq","Kif1b","Tuba1a", "Dync1h1","Dst","Ank2", "Ank3", "Nfasc","Cntn2","Tppp","Ncam1","Mbp","Car2","Ubb","Prdx1","Fth1","Vdac2","Atp5f1","Sepp1","Hopx","Opalin","Ptgds","Il33","Serpinb1a","Hapln2","Rab37"),pt.size = 1)
DefaultAssay(oligos.integratedScience) <- "SCT"
Now we will start to analyse the DTA genes that we found earlier. Here I chose the reactome database, because it seems to give me actual pathways that might be affected.
library(clusterProfiler)
#Convert to gencode using biomart
library(biomaRt)
DTAOL <- subset(oligos.integrated.samplediffMOL5RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.1)
DTAOL <- subset(oligos.integrated.samplediffDTA23vs15RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0)
DTAOL <- subset(oligos.integrated.samplediffOPCRNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.1)
DTAOL$Gene <- row.names(DTAOL)
DTAOL <- Alldiff
genemodulesGO <- DTAOL[!duplicated(DTAOL$Gene),]
listMarts()
ensembl = useMart("ensembl",dataset="mmusculus_gene_ensembl")
listDatasets(ensembl)
attributes = listAttributes(ensembl)
Biomart_gencode_ensembl84_biotypes <- getBM(attributes=c("mgi_symbol","ensembl_gene_id","entrezgene_id","gene_biotype"), filters = "", values = "", ensembl)
Biomart_gencode_ensembl84_biotypes[, 'gene_biotype'] <- as.factor(Biomart_gencode_ensembl84_biotypes[,'gene_biotype'])
#Filter for only our genes
Biotype_All_dataset <- subset(Biomart_gencode_ensembl84_biotypes, mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
entrezID <- subset(Biotype_All_dataset, Biotype_All_dataset$mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
entrezmatched <- entrezID[match(genemodulesGO$Gene,entrezID$mgi_symbol),]
entrezID <- entrezID[! apply(entrezID[,c(1,3)], 1,function (x) anyNA(x)),]
allLLIDs <- entrezmatched$entrezgene
library(clusterProfiler)
#Convert to gencode using biomart
library(biomaRt)
#Subset the differential expression genelist from a seurat diff expression result with the parameters you use.
DTAOL <- subset(oligos.integrated.samplediffMOL5RNA, p_val_adj < 0.01 & abs(avg_logFC) > 0.1)
DTAOL$Gene <- row.names(DTAOL)
#remove any duplicates (sanity check for me)
genemodulesGO <- DTAOL[!duplicated(DTAOL$Gene),]
#Convert to entrez
listMarts()
ensembl = useMart("ensembl",dataset="mmusculus_gene_ensembl")
listDatasets(ensembl)
attributes = listAttributes(ensembl)
Biomart_gencode_ensembl84_biotypes <- getBM(attributes=c("mgi_symbol","ensembl_gene_id","entrezgene_id","gene_biotype"), filters = "", values = "", ensembl)
Biomart_gencode_ensembl84_biotypes[, 'gene_biotype'] <- as.factor(Biomart_gencode_ensembl84_biotypes[,'gene_biotype'])
#Filter for only our genes
Biotype_All_dataset <- subset(Biomart_gencode_ensembl84_biotypes, mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
entrezID <- subset(Biotype_All_dataset, Biotype_All_dataset$mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
entrezmatched <- entrezID[match(genemodulesGO$Gene,entrezID$mgi_symbol),]
#you might need to remove NAs
##entrezID <- entrezID[! apply(entrezID[,c(1,3)], 1,function (x) anyNA(x)),]
allLLIDs <- entrezmatched$entrezgene
library(ReactomePA)
library(org.Mm.eg.db)
modulesReactomeOPC <- enrichPathway(gene=allLLIDs,organism="mouse",pvalueCutoff=0.01,qvalueCutoff = 0.3,pAdjustMethod = "none", readable=T)
head(as.data.frame(modulesReactome))
dotplot(modulesReactome, showCategory=20)
emapplot(modulesReactome)
FC <- genemodulesGO$avg_logFC
names(FC) <- genemodulesGO$Gene
cnetplot(modulesReactome, showCategory = 20,categorySize="pvalue", foldChange=FC,colorEdge = T,node_label=T,circular = F)
library(clusterProfiler)
#Convert to gencode using biomart
library(biomaRt)
OPC_diff
OPC_diff$Gene <- row.names(OPC_diff)
genemodulesGO <- OPC_diff
listMarts()
ensembl = useMart("ensembl",dataset="mmusculus_gene_ensembl")
listDatasets(ensembl)
attributes = listAttributes(ensembl)
Biomart_gencode_ensembl84_biotypes <- getBM(attributes=c("mgi_symbol","ensembl_gene_id","entrezgene_id","gene_biotype"), filters = "", values = "", ensembl)
Biomart_gencode_ensembl84_biotypes[, 'gene_biotype'] <- as.factor(Biomart_gencode_ensembl84_biotypes[,'gene_biotype'])
#Filter for only our genes
Biotype_All_dataset <- subset(Biomart_gencode_ensembl84_biotypes, mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
entrezID <- subset(Biotype_All_dataset, Biotype_All_dataset$mgi_symbol %in% oligos.integrated@assays$SCT@var.features)
entrezmatched <- entrezID[match(genemodulesGO$Gene,entrezID$mgi_symbol),]
entrezID <- entrezID[! apply(entrezID[,c(1,3)], 1,function (x) anyNA(x)),]
allLLIDs <- entrezmatched$entrezgene
library(ReactomePA)
library(org.Mm.eg.db)
modulesReactome <- enrichPathway(gene=allLLIDs,organism="mouse",pvalueCutoff=0.01,qvalueCutoff = 0.3,pAdjustMethod = "none", readable=T)
head(as.data.frame(modulesReactome))
dotplot(modulesReactome, showCategory=8)
emapplot(modulesReactome)
FC <- genemodulesGO$avg_logFC
names(FC) <- genemodulesGO$Gene
cnetplot(modulesReactome, showCategory = 8,categorySize="pvalue", foldChange=FC)
Now to transfer the labels of the Science dataset, we will integrate all the datasets together and try to predict the cluster membership of the 10X data. (Some code is hidden to keep it streamlined)
Generating the UMAP and TSNE, for the integrated dataset with the Science dataset.
oligos.integrated.full <- RunPCA(oligos.integrated.full, verbose = FALSE)
ElbowPlot(oligos.integrated.full)
oligos.integrated.full <- RunUMAP(oligos.integrated.full, dims = 1:30)
#oligos.integrated <- RunTSNE(oligos.integrated, dims = 1:30)
plots <- DimPlot(oligos.integrated.full, group.by = c("Sample"), combine = FALSE)
plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
byrow = TRUE, override.aes = list(size = 3))))
CombinePlots(plots)
# plots <- TSNEPlot(oligos.integrated, group.by = c("Sample"), combine = FALSE)
# plots <- lapply(X = plots, FUN = function(x) x + theme(legend.position = "top") + guides(color = guide_legend(nrow = 3,
# byrow = TRUE, override.aes = list(size = 3))))
# CombinePlots(plots)
Here I repeat the plotting of expression of some common genes that I know are supposed to be more or less stable clusters within the OLs, just for reference.
DefaultAssay(oligos.integrated.full) <- "RNA"
#Normalize RNA data for visualization purposes
oligos.integrated.full <- NormalizeData(oligos.integrated.full, verbose = FALSE)
FeaturePlot(oligos.integrated.full, c("Pdgfra", "Top2a","Ptprz1","Bmp4","Itpr2", "Egr1", "Klk6", "Hopx", "Ptgds","Il33"),pt.size = 0.1)
DefaultAssay(oligos.integrated.full) <- "integrated"
Here I set the clustering resolution high enough to include the COPs,
this means that the MOLs are broken into more clusters than in the
science paper.
I show the clusters on the UMAP so you can see their position.
oligos.integrated.full <- FindNeighbors(oligos.integrated.full, dims = 1:30)
oligos.integrated.full <- FindClusters(oligos.integrated.full,algorithm = 4,resolution = 0.6)
DimPlot(oligos.integrated.full, group.by = c("seurat_clusters"), combine = FALSE)
Below you will find a table of the top 2 markers found for each cluster. pct means percentage of expression, where pct.2 refers to all the cells not in the tested cluster.
oligos.integrated.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
Below follows the heatmap showing the top 10 genes based on fold change for each cluster, using the Seurat found cluster information.
DefaultAssay(oligos.integrated.full) <- "integrated"
top10 <- oligos.integrated.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(oligos.integrated.full, features = top10$gene) + NoLegend()
And here are the top 2 genes found for each cluster as shown on the UMAP.
DefaultAssay(oligos.integrated.full) <- "RNA"
# Normalize RNA data for visualization purposes
oligos.integrated.full <- NormalizeData(oligos.integrated.full, verbose = FALSE)
top2 <- oligos.integrated.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
FeaturePlot(oligos.integrated.full, features = top2$gene,pt.size = 0.1)
DefaultAssay(oligos.integrated.full) <- "integrated"
Now we attempt to transfer the cluster labels of the Science dataset onto the 10X dataset.
DefaultAssay(oligos.integrated.full) <- "integrated"
oligos.query <- oligos.list
for (i in 1:(length(oligos.query)-1)) {
print(paste("Working on dataset ",i," of",length(oligos.list)))
oligos.anchors <- FindTransferAnchors(reference = oligos.integrated.full, query =oligos.list[[i]],
dims = 1:13,reduction = "cca")
predictions <- TransferData(anchorset = oligos.anchors, refdata = oligos.integrated.full$cell_class,
dims = 1:13,weight.reduction = "cca")
oligos.query[[i]] <- AddMetaData(oligos.list[[i]], metadata = predictions)
}
predicted.cellclass <- as.character()
for (i in 1:(length(oligos.query)-1)) {
predicted.cellclass <- c(predicted.cellclass,oligos.query[[i]]$predicted.id)
}
predicted.cellclass <- c(predicted.cellclass,oligos.integrated.full$cell_class[!is.na(oligos.integrated.full$cell_class)])
table(names(predicted.cellclass)==colnames(oligos.integrated.full))
oligos.integrated.full@meta.data$predicted.cellclass <- predicted.cellclass
Here is the end result projected on the UMAP.
DimPlot(oligos.integrated.full, group.by = c("predicted.cellclass"), combine = FALSE)
DiffMatrix <- list()
diffmatrixnames <- c("oligos.integrated.samplediffAllRNA",
"oligos.integrated.samplediffMOL5RNA",
"oligos.integrated.samplediffOPCRNA",
"oligos.integrated.samplediffDTA2vs3RNA")
do.call(head,as.list(as.name(diffmatrixnames[1])))
library(xlsx)
library(stringr)
setwd("~/Documents/SingleCellData/Networkclustering/ElisaAnalysis/IntegrationDTAnetwork/Figures/")
file <- paste("DifferentialExpression.xlsx", sep = "")
for(i in 1:length(diffmatrixnames)){
if(i==1){
write.xlsx(as.data.frame(get(diffmatrixnames[i])), file, sheetName = str_sub(diffmatrixnames[i],start=-10)) }
if(i>1){
write.xlsx(as.data.frame(get(diffmatrixnames[i])), file, sheetName = str_sub(diffmatrixnames[i],start=-10), append = TRUE)
}
}
DiffMatrix <- list()
diffmatrixnames <- c("modulesReactomeMOL5",
"modulesReactomeOPC")
do.call(head,as.list(as.name(diffmatrixnames[1])))
diffmatrixnames <- c("modulesReactomeMOL5",
"modulesReactomeOPC")
library(xlsx)
library(stringr)
setwd("~/Documents/SingleCellData/Networkclustering/ElisaAnalysis/IntegrationDTAnetwork/Figures/")
file <- paste("Pathwayanalysis.xlsx", sep = "")
for(i in 1:length(diffmatrixnames)){
if(i==1){
write.xlsx(as.data.frame(get(diffmatrixnames[i])), file, sheetName = str_sub(diffmatrixnames[i],start=-10)) }
if(i>1){
write.xlsx(as.data.frame(get(diffmatrixnames[i])), file, sheetName = str_sub(diffmatrixnames[i],start=-10), append = TRUE)
}
}
setwd("~/Documents/SingleCellData/Networkclustering/ElisaAnalysis/IntegrationDTAnetwork/")
write.xlsx(oligos.integrated.markers,file="Enrichedgenespercluster.xlsx", sheetName = "EnrichedClustersWilcoxon")
To make some sense of this I have annotated the STRINGdb network with up/down regulation. Red means down in DTA and green up in DTA.